Performance tuning policies for application level fault tolerance in distributed object systems

Authors

  • Theodoros Soldatos
  • Nantia Iakovidou
Abstract

In distributed object systems, application level fault tolerance is often attained by appropriate object replication policies. These policies aim at increasing service availability by masking potential faults that do not recur after recovery. Existing middleware support infrastructures allow customizing object replication properties. However, since fault tolerance has a significant impact on the perceived service performance, there is a need for a suitable quantitative design technique that allows comparing different replication policies by trading off the overhead cost against the achieved fault-tolerance effectiveness. We are also interested in taking into account different concerns in a combined manner (e.g. fault tolerance combined with load balancing and multithreading). This paper presents experimental evidence for the most important performance tradeoffs revealed in a simulation-based study. We considered different cases of object request loss behavior for the faulty objects, as well as a number of request-retry strategies. The experiments took place at two different application workload levels for varied fault detection settings. We provide results for the combined effects of the studied replication policies with two specific load-balancing strategies. The presented results constitute a valuable experience report for performance tuning of object replication policies for application level fault tolerance.

Similar resources

Simulation Metamodeling for the Design of Reliable Object Based Systems

Replication is a suitable approach for the provision of fault tolerance and load balancing in distributed systems. Object replication takes place on the basis of well-designed interaction protocols that preserve object state consistency in an application transparent manner. The published analytic performance models may only be applied in single-server process replication schemes and are not sui...

An approach for adaptive fault-tolerance in object-oriented open distributed systems

Effective fault-handling in emerging complex distributed applications requires the ability to dynamically adapt resource allocation and fault-tolerance policies in response to possible changes in environment, application requirements, and available resources. This paper reports an effort on design and implementation of an adaptive fault-tolerance middleware (AFTM) using a CORBA-compliant object ...

The performance of independent checkpointing in distributed systems

This paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a...

Towards a Performance Model for Special Purpose ORB Middleware

General purpose middleware has been shown effective in meeting diverse functional requirements for a wide range of distributed systems. Advanced middleware projects have also supported single quality-of-service dimensions such as real-time, fault tolerance, or small memory footprint. However, there is limited experience supporting multiple quality-of-service dimensions in middleware to meet the...

Building Dependable Distributed Objects with the AQuA Architecture

Providing fault tolerance to distributed applications is an important problem. The flexibility that software can offer makes it a natural choice for implementing a significant portion of the fault tolerance of dependable distributed systems. Furthermore, when the dependability requirements change during the execution of an application, the fault tolerance approach must be adaptive in the sense ...

Journal title:
  • J. Comput. Meth. in Science and Engineering

Volume 6  Issue 

Pages  -

Publication date 2006